1.  Introduction


Whether the task is to find an individual in a packed stadium or to navigate around
obstacles while driving, visual search is a key task that our visual systems
perform daily. Researchers are investigating how visual systems perform tasks like
these and seek to use that knowledge to develop new image processing
techniques. Feature integration theory states that visual perception of the
surrounding environment or a displayed scene involves both a preattentive process
and an attentive process.
“Preattentive processing of visual information is performed automatically on the
entire visual field detecting basic features of objects in the display. Such basic
features include colors, closure, line ends, contrast, tilt, curvature, and size. These
simple features are extracted from the visual display in the preattentive system and
later joined in the focused attention system into coherent objects. Preattentive
processing is done quickly, effortlessly and in parallel without any attention being
focused on the display.” The attentive, or focused attention, process combines
individual features for object recognition.

Within the preattentive process, different areas of a scene are not considered
equally; some areas draw an individual’s attention more than others. The areas that
draw more visual attention have more visually salient features. Salient features
cause areas within a scene to “pop out,” or draw an individual’s attention
immediately (Fig. 1). Visual salience is the bottom-up, stimulus-driven component
of attention and is linked to the features within a scene, whereas the top-down
component is driven by the intentions and expectations of the person.

Fig. 1 Examples of salient features in artificial scenes

As effective as the preattentive process is, some situations make it
more difficult. For example, it is harder to pick out a dull yellow daffodil
in a field of dull yellow dandelions than to find a bright red rose in that same
field. The human eye is directed to particular regions in a scene by highly salient
features, such as the color of the flower in the previous example.
These areas of interest compete for the viewer’s attention as scenes become more
complex. Physical differences between the visual systems of 2 individuals can also
lead to differences in attention. For example, what draws the attention of a person
who is colorblind may differ from what draws the attention of someone who is not,
because different features are salient to each of them. Many other factors
influence attention, such as center bias, subjective image
selection, image resolution, and a person’s goals. In this study, however, we
focus on the possible influences of image resolution on saliency.

Modeling visual saliency helps researchers understand and predict where a person
will look within a scene. Some models attempt to replicate the physical structure of
the human visual system that controls an individual’s gaze. Other models are
based on the function and behavior of the visual system, which influence
where an individual’s gaze is directed. In general, models use low-level features such as
color, intensity, and orientation to generate saliency maps. In addition, higher-order
statistics have been exploited to enhance the predictive power of saliency-based
models, but the extent to which they are effective remains under investigation.
These types of models can be used for a wide variety of tasks, for example,
navigational assistance, object recognition, and even system design.

The model used in this investigation is a wavelet, entropy-based, saliency ideal
observer model (IOM). It does not require training and relies solely on natural scene
statistics. The IOM employs a bottom-up approach to select salient areas within an
image.
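
The report characterizes the IOM only as wavelet- and entropy-based and does not
give its implementation. As a rough illustration of what an entropy measure
captures, and not a reconstruction of the IOM, the sketch below computes the
Shannon entropy of the intensity values in an image patch. The function name
`shannon_entropy` and the sample patches are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    Illustrative only: this is a generic entropy measure, not the IOM.
    """
    total = len(values)
    counts = Counter(values)
    # Sum -p * log2(p) over every distinct intensity value in the patch.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical 4 x 4 grayscale patches, flattened to lists of intensities.
flat_patch = [128] * 16            # a uniform region
textured_patch = list(range(16))   # 16 distinct intensity values

flat_score = shannon_entropy(flat_patch)
textured_score = shannon_entropy(textured_patch)
```

Under this measure a uniform patch scores zero, while a patch containing many
distinct intensities scores higher.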

Our investigation stems from a general inquiry into resolution and its effects on
preattentive salient features. For this investigation, the image resolution is varied
systematically to explore its influence on identifying salient features within images.
We hypothesize that resolution could be a factor in the locations that the IOM
chooses as interesting or salient. Figure 2 shows an example of the saliency map
used in this investigation.

Fig. 2 A saliency map is a topographical approach to displaying ranges of saliency for an
image. These saliency maps were constructed for a low (right) and high (left) resolution
version of the same image. According to the IOM output, the low resolution image has a much
wider diffusion of attention than the high resolution one.

2.  Methods 


Ten images from the Massachusetts Institute of Technology (MIT) Computer
Science and Artificial Intelligence Laboratory Database of Objects and Scenes
were used for this pre-pilot study. These images were of unique natural scenes with
a variety of compositions and subjects. Two artificial grayscale scenes were
included as well. The images were then bilinearly downsampled into 4 additional
resolutions: 75%, 50%, 25%, and 18.75%. The 18.75% resolution was set as the
lower boundary because of the IOM minimum size limitation for input images. The
bilinear downsampling was done in such a way that aliasing was not present in the
final image. For each image resolution, a data set comprising the top 20 areas of
interest that the IOM selected was obtained. Sixty data sets (12 images at 5
resolutions each) were obtained
and prepared for analysis by recording the coordinates of the salient areas of each
image in a spreadsheet to aid in the calculations. The salient locations for each
resolution were rescaled to the dimensions of the 50% resolution image in order to
provide a common resolution. The rescaling linearly expands or shrinks the pixel
locations of salient areas to match the dimensions of the 50% resolution image.
Scatter plots provide a visual representation of the coordinates of the salient areas
for each of the 5 image resolutions (Fig. 3).
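
The rescaling step can be sketched as a single multiplication per coordinate.
This is a minimal illustration, assuming each resolution is expressed as a
fraction of the original image size and that locations are (x, y) pixel pairs;
the names `rescale_point` and `rescale_points` are hypothetical and do not come
from the study.

```python
# Fractions of the original size: the original plus the 4 downsampled versions.
RESOLUTIONS = [1.00, 0.75, 0.50, 0.25, 0.1875]
COMMON_RESOLUTION = 0.50   # all locations are mapped onto the 50% image
N_IMAGES = 10 + 2          # 10 natural scenes plus 2 artificial scenes

# 12 images at 5 resolutions each gives the 60 data sets in the text.
N_DATA_SETS = N_IMAGES * len(RESOLUTIONS)


def rescale_point(x, y, source_resolution, target_resolution=COMMON_RESOLUTION):
    """Linearly expand or shrink one pixel location to the target resolution."""
    factor = target_resolution / source_resolution
    return (x * factor, y * factor)


def rescale_points(points, source_resolution, target_resolution=COMMON_RESOLUTION):
    """Apply rescale_point to every (x, y) location in a data set."""
    return [rescale_point(x, y, source_resolution, target_resolution)
            for x, y in points]


# The same scene location, as it would appear at 100% and at 25% resolution.
from_full = rescale_point(400, 300, 1.00)      # shrinks by a factor of 0.5
from_quarter = rescale_point(100, 75, 0.25)    # expands by a factor of 2
```

Both example points land on the same location in the 50% frame, which is what
allows salient areas from different resolutions to be plotted together.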

A cluster is defined as a group of points that share a coordinate set with a deviation
limit of ±5. Each plot was reviewed to identify potential issues
with the data collected. Salient areas that were different for the same image set or
similar across different images were highlighted. If an error occurred when using these
data sets in the IOM, the resolution was checked and the procedure repeated.

Fig. 3 Sample image shown in 4 resolutions in descending order (a-e). The plot compiles
the areas of interest displayed in the images and each symbol represents 1 of the images. Data
clusters indicate where the IOM identified areas of interest across all resolutions. The circles
were superimposed over the data clusters for emphasis.
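
One way to apply the ±5 cluster definition programmatically is sketched below.
Several details are assumptions rather than the study's procedure: the
deviation is taken per axis in pixels of the common 50% frame, each point is
compared against the first point of a cluster, and the function names and
sample coordinates are hypothetical.

```python
TOLERANCE = 5  # deviation limit of +/-5 from the text; pixel units are assumed


def find_clusters(points, tol=TOLERANCE):
    """Greedily group (x, y, resolution_label) points.

    A point joins the first cluster whose seed (first point) lies within
    `tol` on both axes; otherwise it starts a new cluster.
    """
    clusters = []
    for x, y, label in points:
        for cluster in clusters:
            seed_x, seed_y, _ = cluster[0]
            if abs(x - seed_x) <= tol and abs(y - seed_y) <= tol:
                cluster.append((x, y, label))
                break
        else:
            clusters.append([(x, y, label)])
    return clusters


def shared_clusters(clusters, min_resolutions=2):
    """Keep clusters that contain points from several distinct resolutions."""
    return [c for c in clusters
            if len({label for _, _, label in c}) >= min_resolutions]


# Made-up salient locations, already rescaled to the common frame.
sample = [
    (200, 150, "100%"), (203, 148, "75%"), (198, 153, "50%"),
    (40, 60, "25%"), (320, 90, "18.75%"),
]
clusters = find_clusters(sample)
shared = shared_clusters(clusters)
```

In this made-up sample the three higher-resolution points form one shared
cluster, while the two lowest-resolution points each stand alone.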


3.  Results  and  Discussion 


When the salient areas highlighted by the IOM were overlaid onto the
corresponding images, the results showed a number of clusters as well as
nonclustered areas. One of the first steps of the IOM is decomposing the source image
into smaller samples of the image. Therefore, the IOM does not inherently process
an image differently at varying resolutions. However, high-resolution images can
be downsampled more. The results indicate that the IOM processed images
differently because of the lower resolution. A question for future investigation is
whether other factors were involved, which could be explored by examining the
decomposition of the original image. Had resolution not been a factor in
IOM-replicated saliency, the graphs
would show overlapping clusters. The number of clusters varied among image sets.
The lowest resolution data sets all displayed a lattice-like pattern in identified areas
of interest, whereas the higher resolution data sets showed areas of interest more
closely concentrated around salient areas. The artificial controls showed a similar
effect to the natural images.
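
The expectation of overlapping clusters can be expressed as a simple score.
The sketch below is an illustration rather than an analysis performed in this
study: it computes the fraction of salient locations at one resolution that
have a counterpart within ±5 at another resolution, assuming both sets have
already been rescaled to the common frame. The name `overlap_fraction` and the
coordinates are hypothetical.

```python
def overlap_fraction(reference, other, tol=5):
    """Fraction of `reference` points matched by some point in `other`.

    A match means the two points differ by at most `tol` on both axes.
    Returns 0.0 for an empty reference set.
    """
    if not reference:
        return 0.0
    matched = sum(
        1 for rx, ry in reference
        if any(abs(rx - ox) <= tol and abs(ry - oy) <= tol for ox, oy in other)
    )
    return matched / len(reference)


# Made-up locations for a higher and a lower resolution of the same image.
high_res = [(200, 150), (50, 60), (310, 220)]
low_res = [(202, 149), (100, 100), (200, 200)]

score = overlap_fraction(high_res, low_res)   # only 1 of the 3 points matches
```

A score near 1 across all resolution pairs would be consistent with resolution
not being a factor; lower scores would be consistent with the shifts described
above.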

The study results suggest that resolution is an influential factor. Major observations
were that salient areas did not cluster as resolution changed and that salient areas
appeared in uniform patterns as the resolution of the image decreased. It is
unclear whether the treatment of neighboring pixels or the processing of images of
varying size contributes to the lattice-like pattern in the results from the lower
resolution data sets. Also, the particular resolution percentages selected may
have influenced the effect of resolution change in the IOM.

4.  Conclusion  and  Future  Work 


Given the variation within the data sets, these initial results indicate there may be a
shift in the areas of interest identified by the IOM as an image decreases in
resolution. As resolution decreased, IOM-identified areas of interest appeared to
move progressively away from the highly salient areas found at higher resolutions;
salient areas in the higher resolution images seemed more likely to cluster. However,
other features may play more influential roles than changes in resolution.

The findings suggest that resolution may have an effect on overall preattention
simulation in the IOM. The results discussed were part of an initial pre-pilot study
only; additional studies would be needed to further investigate the trends found in
these results, such as the inconsistencies in the clusters throughout the image sets
that may indicate underlying confounding variables. Additional investigation
could also explore the relationship between the clustering of salient areas and the
rank the IOM assigns to them.

Further exploration of the idea initiated in this study may reveal new information
concerning the degree of influence resolution has on traditional preattentive
features and potentially improve image processing techniques. Additional research
using larger quantities of images with a wider variety of compositions is
recommended to investigate the relationship between resolution and other salient
features. A practical next step would be to differentiate between size and
pixel density as an added definition to the resolution factor. To test this
proposition more thoroughly, comparative studies of resolution variations in other
saliency models could also be conducted. If other models display trends similar to
the IOM’s, this would provide further evidence of resolution influence. Including
ground truth human visual data from studies in which saccade positions were
measured while subjects viewed imagery at varying resolutions may also
aid further research.